Abstract

We study the optimality of the minimax risk of truncated series estimators for symmetric convex polytopes. We show that the optimal truncated series estimator is within an $O(\log m)$ factor of the optimal if the polytope is defined by $m$ hyperplanes. This represents the first such bound towards general convex bodies. In proving our result, we first define a geometric quantity, called the approximation radius, for lower bounding the minimax risk. We then derive our bounds by establishing a connection between the approximation radius and the Kolmogorov width, the quantity that provides upper bounds for the truncated series estimator. Besides, our proof contains several ingredients which might be of independent interest: 1. The notion of approximation radius depends on the volume of the body. It is an intuitive notion and is flexible enough to yield strong minimax lower bounds; 2. The connection between the approximation radius and the Kolmogorov width is a consequence of a novel duality relationship on the Kolmogorov width, developed by utilizing some deep results from convex geometry [1, 19, 8].

1 Introduction

In this paper, we study the minimax risk of estimators for symmetric convex polytopes. We show that for a symmetric convex polytope defined by $m$ hyperplanes, the truncated series estimator, a special type of linear estimator, is within an $O(\log m)$ factor of the optimal.

In non-parametric statistics, the minimax risk of an estimator measures the worst case expected loss of the estimator for input coming from some subset $X \subset \mathbb{R}^n$ (see Section 2.2 for a formal definition). Tremendous work has been done on understanding the optimal minimax risk for various families of $X$, but it is usually very difficult to design the optimal estimator. The truncated series estimator is a family of linear estimators that simply project an observation onto a properly chosen subspace. Despite its simplicity, the truncated series estimator is surprisingly powerful and has been shown to be nearly optimal for wide families of convex bodies. [17] shows that such an estimator is nearly optimal for ellipsoids. In [5], it is shown that it is nearly optimal for the wider family of orthosymmetric and quadratically convex objects, including $\ell_p$ balls for $p \ge 2$.

In this paper, we show that the power of the truncated series estimator extends to the rich class of symmetric polytopes. Specifically, we show that for a symmetric convex polytope defined by $m$ hyperplanes, the truncated series estimator is within an $O(\log m)$ factor of the optimal. Previously, such results have only been obtained for particular families of convex polytopes, such as those corresponding to the Lipschitz condition p3] or satisfying certain isometric conditions [18]. As a motivating example, we discuss one application of our result in estimating values of a Lipschitz function.

Example. One important estimation problem in the literature is the estimation of functions satisfying certain continuity or Lipschitz conditions from noisy measurements. Consider a univariate Lipschitz function $f : [0,1] \to \mathbb{R}$. Suppose that $x_i = f(t_i)$ for $i = 1, \dots, n$, and we have measurements $y_i$ according to the model $y_i = x_i + w_i$ for some Gaussian noise $w_i$. Then the Lipschitz condition, with constant $L$, translates to the linear constraints:

$$|x_{i+1} - x_i| \le L\,|t_{i+1} - t_i|, \quad \text{for } i = 1, \dots, n-1. \tag{1}$$

Now, we are interested in estimating $x_i$ from $y_i$. A key observation is that the vector $x = (x_1, \dots, x_n)$ falls in the set $X$, where

$$X = \{x : |x_{i+1} - x_i| \le L\,|t_{i+1} - t_i|, \text{ for } 1 \le i \le n-1\}. \tag{2}$$

Note that $X$ is a symmetric convex polytope.

When the sampling is uniform, i.e. $t_i = (i-1)/(n-1)$, then $X$ has the more special form $X = \{x : |x_{i+1} - x_i| \le L/(n-1)\}$. In this case, previous work [14, 20] has shown that the best truncated series estimator is nearly optimal. As a consequence of our work, the truncated series estimator is nearly optimal (within an $O(\log n)$ factor) for estimating a Lipschitz function at an arbitrary sample set $\{t_1, \dots, t_n\}$.
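
The reduction from the Lipschitz condition to a polytope of the form $\{x : \|Ax\|_\infty \le 1\}$ can be made concrete. The following Python sketch builds such a matrix $A$ from the constraints (1), assuming distinct sample points and $L > 0$; the function names are illustrative and do not come from the paper.

```python
def lipschitz_constraint_matrix(t, L):
    """Rows of A such that {x : max_i |(A x)_i| <= 1} encodes constraints (1)."""
    n = len(t)
    rows = []
    for i in range(n - 1):
        gap = L * abs(t[i + 1] - t[i])  # right-hand side of constraint i
        if gap == 0:
            raise ValueError("sample points must be distinct and L positive")
        row = [0.0] * n
        row[i], row[i + 1] = -1.0 / gap, 1.0 / gap
        rows.append(row)
    return rows


def in_polytope(A, x, tol=1e-12):
    """Check max_i |(A x)_i| <= 1 up to a small floating-point tolerance."""
    return all(abs(sum(a * v for a, v in zip(row, x))) <= 1.0 + tol for row in A)
```

With $n$ sample points the matrix has $m = n - 1$ rows, so a bound of order $\log m$ is of order $\log n$ in this example.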

At a high level, the proof of our results follows a very simple strategy. We choose a family of "obstruction objects" for which we can obtain lower bounds on the minimax risk. Then we show a "duality" result: if $X$ does not have a good truncated series estimator, then it must contain a "large" obstruction, and therefore no estimator can do well on $X$. Of course, the difficulty is in choosing the obstruction so that we can prove the corresponding duality result. Some natural obstructions include hyper-rectangles and Euclidean balls, for which we know very tight minimax lower bounds. But they turn out to be too restrictive to allow a strong enough duality result. To overcome this difficulty, we consider a broader family consisting of objects which contain a "non-negligible" fraction of a "large" Euclidean ball, for which we are able to establish the desired duality relationship.

More specifically, we first define a geometric measure for any set, called the approximation radius, and then develop a lower bound technique which bounds the minimax risk of any body by its approximation radius. Intuitively, the approximation radius of an object $X$ is the maximum radius of a ball with a "non-negligible" volume fraction inside $X$. By refining the technique in [23], we can show that the minimax risk of $X$ is asymptotically as large as that of the ball with $X$'s approximation radius (see Theorem 3.2). On the other hand, the minimax risk of the truncated series estimator is determined by the Kolmogorov width of the object. Our bound is then derived by establishing a connection between the Kolmogorov widths and the approximation radius of symmetric convex bodies (see Theorem 3.4). For the connection, we first derive a duality relationship between the Kolmogorov widths of $X$ and its polar dual $X^\circ$ (see Theorem 3.3) by utilizing some results from convex geometry started in [1]. The Kolmogorov width of $X^\circ$ is then shown, by probabilistic arguments, to be intimately related to the approximation radius of $X$.

1.1 Related work



There is a vast body of work on minimax estimators and it is beyond the scope of this paper to survey all of it. We refer to [14, 20, 11] for comprehensive surveys and will describe some work most relevant to this paper. Since we focus on the mean squared error (MSE), all the subsequent discussion is in the context of MSE.

Minimax bounds have been developed for various families of convex bodies through intensive research in the past decades. Asymptotically tight bounds have been established for convex bodies that correspond to various continuity or energy conditions: the classes of Hölder balls, Sobolev balls, and Besov balls. We refer to Chapter 2.8 in [20] for a comprehensive account of the references. Despite these remarkable results, it is still largely unknown how to compute the minimax risk for an arbitrary convex body. Some previous work does attempt to deal with less specific objects (see [18] and the references therein), but all the optimality results are under (fairly strong) isometric assumptions about the objects.

On the other hand, the truncated series estimator has a nice geometric interpretation and is related to the classical Kolmogorov width of the underlying space. In addition to its simplicity, [5] shows that it is asymptotically optimal for the classes of orthosymmetric and quadratically convex objects. This includes the class of diagonally stretched $\ell_p$ balls for $p \ge 2$. The present paper shows that the power of truncated series estimators also extends to the family of symmetric convex polytopes, as long as the polytope is defined by hyperplanes.

To achieve our result, we develop a lower bound technique based on a geometric quantity 
which we dub approximation radius. Using Fano's inequality and the refinement developed 
in [23, 18], we show that the minimax risk of a convex body is lower bounded by that of the
ball with radius equal to the approximation radius of that body. Compared to the existing 
lower bound techniques, such as the Bernstein bound and the bound followed from considering 
the worst (typically discrete) distributions (see [HI [20] and [HE]), the approximation radius 
relies on a volume estimation and is both convenient to operate and flexible to provide strong 
lower bounds. 

One centerpiece of this paper is the connection established between the approximation radius and the Kolmogorov width. Towards this step, we use some results developed in Banach space geometry, initiated in [1] for investigating the invertibility of matrices with large "robust" rank and subsequently developed by [19, 8]. In particular, we show a duality relationship between the Kolmogorov widths of a convex body and its polar dual body. Our result has a similar flavor to the classical duality in [13] but is tighter when the dimension gap is small.

2 Preliminaries 

2.1 Notations and definitions 

For a vector $x = (x_1, \dots, x_n)$ and a real number $p \ge 1$, denote by $\|x\|_p$ the $\ell_p$-norm of $x$, and $\|x\|_\infty = \max_i |x_i|$. When $p$ is absent, it means the $\ell_2$ norm. Let $B_p^n(x, r)$ denote the $n$ dimensional $\ell_p$ ball with radius $r$ and center $x$. Whenever the center is at the origin, it is denoted by $B_p^n(r)$. Also, we drop the superscript $n$ whenever the dimension is clear from the context, and suppress the argument $r$ for $r = 1$.

A set $X \subset \mathbb{R}^n$ is called centrally symmetric (or simply symmetric) if for any $x \in X$, we have $-x \in X$. For a set $X$, the ($\ell_2$) radius of $X$ is defined as follows.

$$\mathrm{rad}(X) = \max_{x \in X} \|x\|.$$

For $p > 0$, and $n \ge 1$, the family $\mathcal{F}_p^{m,n}$ is defined as

$$\mathcal{F}_p^{m,n} = \{X : X = \{x : \|Ax\|_p \le 1\}, \text{ for } A \in \mathbb{R}^{m \times n}\}. \tag{3}$$

In particular, when $p = \infty$, $\mathcal{F}_\infty^{m,n}$ consists of symmetric convex polytopes defined by $m$ hyperplanes. Throughout we consider bounded convex bodies. Our results easily extend to unbounded convex bodies, but the presentation would become cumbersome because of a separate case analysis which does not add any new insight.

2.2 Minimax risk 

Suppose we are given measurements of an unknown n-dimensional vector x, according to the 
model 

y = x + w, (4) 

where $w \in \mathbb{R}^n$ follows the normal distribution, $w \sim N(0, \sigma^2 I)$, and $x$ lies in $X$, a compact convex set in $\mathbb{R}^n$. The goal of the minimax estimation problem is to estimate the vector $x$ with small error loss, and to evaluate the estimator under the minimax principle.

For any estimator $M : \mathbb{R}^n \to \mathbb{R}^n$, the maximum mean squared error of $M$ on $(X, \sigma)$ is defined as

$$R(M, X, \sigma) = \max_{x \in X} E\|x - M(y)\|^2,$$

and the minimax risk of $X$ is

$$R(X, \sigma) = \min_M R(M, X, \sigma).$$

Estimators can in general be nonlinear functions. We denote by $R_L(X, \sigma)$ the minimax risk when $M$ is linear. A simpler alternative to general linear and nonlinear estimators is the truncated series estimator [5]. A truncated series estimator is obtained using projections $M(y) = Py$, with $P^2 = P$. Throughout this paper, projection always means orthogonal projection. The minimax risk for truncated series estimators is defined as

$$R_T(X, \sigma) = \min_P \max_{x \in X} E\|x - Py\|^2,$$

where the minimum is taken over all the linear projections. Since truncated series estimators are linear, we clearly have

$$R(X, \sigma) \le R_L(X, \sigma) \le R_T(X, \sigma).$$

It turns out that the minimax risk for truncated series estimators is completely characterized by the Kolmogorov $k$-width $d_k$ of $X$, defined as [16]

$$d_k(X) = \min_{P_k} \max_{x \in X} \|x - P_k x\|,$$

where the minimum is taken over all $k$-dimensional projections. Then, we have

$$R_T(X, \sigma) = \min_k \left( d_k(X)^2 + k\sigma^2 \right). \tag{5}$$
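
As an illustration of relation (5), the sketch below evaluates the truncated series risk from a list of widths. It relies on a standard fact that is not stated in this paper: for an axis-aligned ellipsoid with semi-axes $a_1 \ge \dots \ge a_n$, the Kolmogorov widths are $d_k = a_{k+1}$ for $k < n$ and $d_n = 0$. The function names are illustrative.

```python
def ellipsoid_widths(semi_axes):
    """Kolmogorov widths d_0, ..., d_n of an axis-aligned ellipsoid (assumed fact)."""
    a = sorted(semi_axes, reverse=True)
    return a + [0.0]  # d_k = a_{k+1} for k < n, and d_n = 0


def truncated_series_risk(widths, sigma):
    """Return (min_k d_k^2 + k * sigma^2, minimizing k), as in relation (5)."""
    return min((d * d + k * sigma * sigma, k) for k, d in enumerate(widths))
```

The minimizing $k$ balances the squared bias $d_k(X)^2$ of the projection against the variance $k\sigma^2$ of the retained coordinates.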

For the mean squared error considered in this paper, there is a more direct equivalent definition of the Kolmogorov $k$-width under the $\ell_2$ metric:

$$d_k(X) = \min_{P \in \mathcal{P}_k} \mathrm{rad}(P(X)),$$

where $\mathcal{P}_k$ denotes all the $k$-codimensional (or $n - k$ dimensional) projections, and $\mathrm{rad}(K)$ denotes the $\ell_2$ radius of $K$, defined as $\max_{x \in K} \|x\|_2$. Furthermore,

$$\mathrm{rad}(X) = d_0(X) \ge d_1(X) \ge \dots \ge d_n(X) = 0. \tag{6}$$

2.3 Approximation radius 

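We define the notion of approximation radius, a geometric measure of any convex body, which, as we shall show, provides a lower bound for the minimax risk of the body.

We use $\mathrm{vol}(X)$ to denote the volume of $X$ and $\mathcal{H}_k$ to denote all the $k$ dimensional subspaces in $\mathbb{R}^n$. Assume $X \subset \mathbb{R}^n$ is a convex body that contains the origin. For any $r > 0$, the volume ratio $\mathrm{vr}(X, r)$ of $X$ is defined as

$$\mathrm{vr}(X, r) = \left( \frac{\mathrm{vol}(X \cap B_2^n(r))}{\mathrm{vol}(B_2^n(r))} \right)^{1/n}$$

and the $k$-volume ratio $\mathrm{vr}_k(X, r)$ of $X$ is defined as the maximum volume ratio over all the $k$ dimensional central cuts of $X$, i.e.

$$\mathrm{vr}_k(X, r) = \max_{H \in \mathcal{H}_k} \mathrm{vr}(X \cap H, r),$$

where the volume ratio of a cut $X \cap H$ is computed within $H$, i.e. with $k$-dimensional volumes and exponent $1/k$.

Clearly, $0 \le \mathrm{vr}(X, r) \le 1$. Further,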

Fact 2.1. If $X$ is convex and contains the origin, then $\mathrm{vr}(X, r)$, and hence $\mathrm{vr}_k(X, r)$ for any $k$, is non-increasing in $r$.

Proof. It suffices to show that for any $c \ge 1$, $\mathrm{vr}(X, c \cdot r) \le \mathrm{vr}(X, r)$. We have

$$X \cap B_2^n(c \cdot r) = c\left(\tfrac{1}{c} X \cap B_2^n(r)\right) \subseteq c\left(X \cap B_2^n(r)\right),$$

where $\tfrac{1}{c} X \subseteq X$ follows from the assumption that $X$ is convex and contains the origin. Therefore $\mathrm{vol}(X \cap B_2^n(c \cdot r)) \le c^n\, \mathrm{vol}(X \cap B_2^n(r))$. The claim follows immediately from the definition of volume ratio and the identity $\mathrm{vol}(B_2^n(c \cdot r)) = c^n\, \mathrm{vol}(B_2^n(r))$. □
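
The volume ratio can be estimated numerically for a polytope $X = \{x : \|Ax\|_\infty \le 1\}$. The following is a minimal Monte Carlo sketch (not part of the paper's argument): it samples uniformly from $B_2^n(r)$ and counts the fraction of samples that land in $X$. The function names, sample size, and seed are illustrative choices.

```python
import math
import random


def sample_ball(n, r, rng):
    """Uniform sample from the n-dimensional Euclidean ball of radius r."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in g))
    radius = r * rng.random() ** (1.0 / n)  # radial law of the uniform distribution
    return [radius * v / norm for v in g]


def volume_ratio(A, r, samples=20000, seed=0):
    """Monte Carlo estimate of vr(X, r) for X = {x : max_i |(A x)_i| <= 1}."""
    n = len(A[0])
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x = sample_ball(n, r, rng)
        if all(abs(sum(a * v for a, v in zip(row, x))) <= 1.0 for row in A):
            hits += 1
    return (hits / samples) ** (1.0 / n)
```

For the cube $[-1, 1]^3$ (with $A$ the identity), the estimate equals 1 for $r \le 1$ and decreases as $r$ grows, in line with Fact 2.1.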

Central to lower bounding the minimax risk is the notion of approximation radius. 

Definition 2.2. For $0 < c < 1$, and integer $1 \le k \le n$, the $(c, k)$-approximation radius of $X$, denoted by $z_{c,k}(X)$, is defined as the maximum $r$ such that $\mathrm{vr}_k(X, r) \ge c$, i.e.

$$z_{c,k}(X) = \sup\{r : \mathrm{vr}_k(X, r) \ge c\}. \tag{7}$$

Note that if $X$ contains the origin in its interior, then $z_{c,k}(X)$ is always defined for $0 < c < 1$.

2.4 Polar dual of convex bodies

The connection between the Kolmogorov width and the approximation radius is established 
via the polar dual of the body. We state some basic facts about the polar dual body which 
we will need later. 

Definition 2.3. For any $K \subset \mathbb{R}^n$, denote by $K^\circ$ the (polar) dual set of $K$,

$$K^\circ = \{y \mid x \cdot y \le 1 \text{ for all } x \in K\}.$$

If $K$ lies on a lower dimensional subspace, $K^\circ$ is understood as the dual set on the lowest dimensional subspace that contains $K$.

Fact 2.4. If $X = \{x : \|Ax\|_\infty \le 1\}$, then $X^\circ = A^T B_1^m$.

Fact 2.5. Let $H$ be a subspace of $\mathbb{R}^n$. Denote by $P_H$ the projection on $H$. Then $P_H(K^\circ) = (H \cap K)^\circ$.

Proof. We include a proof of this fact for the sake of completeness. We prove the different but equivalent identity $P_H(K)^\circ = H \cap K^\circ$. Let $m = \dim(H)$ and let $H = (h_1, \dots, h_m)$ be an orthonormal basis of $H$. With a slight abuse of notation, we denote by $H$ the matrix in $\mathbb{R}^{n \times m}$ that has the $h_i$ as columns. Then $P_H(x) = HH^T x$. Observe that for any $x \in \mathbb{R}^n$, $y \in \mathbb{R}^m$, $(HH^T x) \cdot (Hy) = x^T HH^T Hy = x^T Hy = x \cdot (Hy)$. Hence

$$Hy \in P_H(K)^\circ \iff \forall x \in K,\ (HH^T x) \cdot (Hy) \le 1 \iff \forall x \in K,\ x \cdot Hy \le 1 \iff Hy \in H \cap K^\circ.$$

□

3 Main results 

In this paper, we are interested in the minimax risk of the truncated series estimator for symmetric convex bodies. Define $\beta(X) = \max_{\sigma > 0} R_T(X, \sigma)/R(X, \sigma)$, and $\beta_p^{m,n} = \max_{X \in \mathcal{F}_p^{m,n}} \beta(X)$. Our main result is

Theorem 3.1. If $n = \Omega(\log m)$, then $\beta_\infty^{m,n} \le c \cdot \log m$, where $c \le 2 \cdot 10^8$. Furthermore, $\beta_\infty^{m,n} = \Omega(\sqrt{\log m / \log\log m})$.

The lower bound follows immediately from previous works. As shown in [2] (Theorem 3), for the unit $\ell_1$ ball $X = B_1^n$, $R_T(X, 1/\sqrt{n}) = \Omega(1)$ but $R(X, 1/\sqrt{n}) = O(\sqrt{\log n / n})$. Since $B_1^n \in \mathcal{F}_\infty^{m,n}$ where $m = 2^n$, we have $\beta_\infty^{m,n} = \Omega(\sqrt{\log m / \log\log m})$ for $n = \Omega(\log m)$. In this paper, our main result is to provide a nearly matching upper bound of $O(\log m)$. The upper bound is the consequence of the following theorems: Theorem 3.2 lower bounds the minimax risk by the approximation radius; Theorems 3.3 and 3.4 together establish a lower
bound on the approximation radius by the Kolmogorov width, which in turn upper bounds 
the minimax risk of the truncated series estimator. We assign concrete values to constants 
whenever possible. They are purely for presentation clarity and by no means represent the 
best possible constants.

Theorem 3.2. There exists a universal constant $C = 2.46 \cdot 10^{-4}$ such that for any $0 < c_* < 1$,

$$R(X, \sigma) \ge C c_*^2 \max_k \min\{z_{c_*,k}(X)^2, k\sigma^2\}. \tag{8}$$

Theorem 3.3. For any convex centrally symmetric $X \subset \mathbb{R}^n$ and any $0 < k \le n$ and $0 < \epsilon < 1$,

$$d_k(X)\, d_{n-(1-\epsilon)k}(X^\circ) \le c_1 \sqrt{k/\epsilon}, \tag{9}$$

where $c_1 = 2/(\sqrt{2} - 1) < 5$.

Theorem 3.4. Let $X \in \mathcal{F}_\infty^{m,n}$. For any $0 < c_* \le 0.2$ and $0 < k \le n$,

$$z_{c_*,k}(X) \ge c_2 \sqrt{\frac{k}{\ln m}} \cdot \frac{1}{d_{n-k}(X^\circ)}, \tag{10}$$

where

$$c_2 = 0.4 \sqrt{\ln(1/(2c_*))}. \tag{11}$$

The paper is mainly devoted to proving Theorems 3.2, 3.3, and 3.4, which together imply Theorem 3.1. We discuss some consequences of our results as well as some open questions at the end.



4 Lower bounding the minimax risk 

In this section we prove Theorem 3.2. Our starting point is the obvious lower bound for the Euclidean ball $B_2^n(r)$. It is well known that $R(B_2^n(r), \sigma) = \Omega(\min(r^2, n\sigma^2))$. We shall show that this is also true for any subset contained in $B_2^n(r)$ with "non-negligible" (a fraction of $\Omega(c^n)$ for some constant $c > 0$) volume.

The proof is based on the information-theoretic bound established in [23]. In this technique, the minimax risk is lower bounded by restricting to a maximal finite set of points $\{x_1, \dots, x_r\}$ in $X$, separated from each other by at least an amount $\epsilon$ in the loss metric. Indeed, $\epsilon$ is the maximum separation distance such that the hypotheses $\{x_1, \dots, x_r\}$ are almost indistinguishable. The Fano inequality is then used to relate this indistinguishability to the K-L divergence.

We proceed by defining an $\epsilon$-net and a $\delta$-packing in a set $S$.

Definition 4.1. A set $N_\epsilon \subseteq S$ is said to be an $\epsilon$-net for $S$ if for any $x \in S$, there exists an $x_0 \in N_\epsilon$ such that $\|x - x_0\| \le \epsilon$. In addition, a finite set $M_\delta \subseteq S$ is said to be a $\delta$-packing in $S$ if for any $x, x' \in M_\delta$, $x \ne x'$, we have $\|x - x'\| \ge \delta$.

Proposition 4.2. For any set $X$, let $N_\epsilon(X)$ be any $\epsilon$-net for $X$ and $M_\delta(X)$ be a $\delta$-packing in $X$. Then,

$$R(X, \sigma) \ge \frac{\delta^2}{4} \left( 1 - \frac{\log|N_\epsilon(X)| + \epsilon^2/(2\sigma^2) + 1}{\log|M_\delta(X)|} \right). \tag{12}$$

Proposition 4.2 is a direct application of the bound proved in [23] (Theorem 1). For the reader's convenience, we give the details of its derivation in Appendix A.

Note that the strongest lower bound in Eq. (12) is achieved by the smallest $\epsilon$-net and the largest $\delta$-packing of $X$. In the following, we will develop an upper bound on the size of the smallest $\epsilon$-net for $X$ and a lower bound for the size of its largest $\delta$-packing.

Lemma 4.3. For any $X \subset \mathbb{R}^n$, $r \ge \mathrm{rad}(X)$ and $\epsilon \le r$, there exists an $\epsilon$-net for $X$ with size at most $(3r/\epsilon)^n$.

The proof of Lemma 4.3 is deferred to Appendix B.

Lemma 4.4. For any $\delta > 0$, there exists a $\delta$-packing $M_\delta(X)$ with size at least $\mathrm{vol}(X)/\mathrm{vol}(B_2^n(\delta))$.

We refer to Appendix C for the proof of Lemma 4.4.
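
Nets and packings are closely related: a packing that cannot be extended is also a net at the same scale, since any point that could not be added must lie within $\delta$ of a chosen center. The sketch below illustrates this on a finite point set with a greedy construction. It is only an illustration of Definition 4.1, not the argument used in the appendices, and the function name is hypothetical.

```python
import math


def greedy_packing(points, delta):
    """Greedily pick points that are pairwise at distance >= delta.

    Every input point that is rejected lies within distance < delta of a
    chosen center, so the result is also a delta-net for the input points.
    """
    centers = []
    for p in points:
        if all(math.dist(p, c) >= delta for c in centers):
            centers.append(p)
    return centers
```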
We are now in a position to prove Theorem 3.2.

Proof. (Theorem 3.2) For any $k$ and $c_*$, consider the $k$-dimensional central cross section $Y$ of $X$ that attains the approximation radius $z_{c_*,k}$. Let $r_k = \min\{z_{c_*,k}(X), \sqrt{k}\sigma\}$, and $Y_k = Y \cap B_2(r_k)$. Clearly, $R(X, \sigma) \ge R(Y_k, \sigma)$, since $Y_k \subseteq X$. We will lower bound $R(Y_k, \sigma)$ by applying Proposition 4.2 and Lemmas 4.3 and 4.4.

Since $\mathrm{rad}(Y_k) \le r_k$, by Lemma 4.3, for any $\epsilon \le r_k$ there exists an $\epsilon$-net of $Y_k$, say $N$, with $|N| \le (3r_k/\epsilon)^k$. On the other hand, by Fact 2.1, $\mathrm{vr}_k(Y, r_k) \ge \mathrm{vr}_k(Y, z_{c_*,k}(X)) = c_*$. Therefore

$$\mathrm{vol}_k(Y_k) = \mathrm{vol}_k(Y \cap B_2(r_k)) = \mathrm{vr}_k(Y, r_k)^k\, \mathrm{vol}_k(B_2^k(r_k)) \ge c_*^k\, \mathrm{vol}_k(B_2^k(r_k)).$$

Combining it with Lemma 4.4, there exists a $\delta$-packing of $Y_k$, say $M$, with $|M| \ge c_*^k\, \mathrm{vol}_k(B_2^k(r_k))/\mathrm{vol}_k(B_2^k(\delta)) = (c_* r_k/\delta)^k$.

Choose $\delta = (c_*/a) r_k$ and $\epsilon = r_k$, where $a$ is a constant to be determined. Using the bounds on $|N|$ and $|M|$ in Proposition 4.2, we obtain

$$R(Y_k, \sigma) \ge \frac{1}{4}\left(\frac{c_* r_k}{a}\right)^2 \left(1 - \frac{k \log 3 + k/2 + 1}{k \log a}\right) \ge \frac{1}{4}\left(\frac{c_* r_k}{a}\right)^2 \left(1 - \frac{\log 3 + 3/2}{\log a}\right). \tag{13}$$

Maximizing the right hand side over $a > 1$, we get $a = 12.89$. Plugging in for $a$ in Eq. (13), we obtain $R(Y_k, \sigma) \ge C c_*^2 r_k^2$, with $C = 2.46 \cdot 10^{-4}$. Since $1 \le k \le n$ is arbitrary, we have

$$R(X, \sigma) \ge \max_k R(Y_k, \sigma) \ge \max_k C c_*^2 r_k^2 = C c_*^2 \max_k \min\{z_{c_*,k}(X)^2, k\sigma^2\}.$$

□
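
The constants $a = 12.89$ and $C = 2.46 \cdot 10^{-4}$ quoted in the proof can be checked numerically. The sketch below assumes that the factor to be maximized has the form $\frac{1}{4a^2}\left(1 - \frac{\log_2 3 + 3/2}{\log_2 a}\right)$, with logarithms taken to base 2; this reading is an assumption on our part, chosen because it reproduces both quoted constants. The helper names are illustrative.

```python
import math

# Assumed numerator of the bracket, with logarithms in base 2.
B = math.log2(3) + 1.5


def bracketed_bound(a):
    """Factor multiplying (c_* r_k)^2: (1 / (4 a^2)) * (1 - B / log2(a))."""
    return (1.0 - B / math.log2(a)) / (4.0 * a * a)


def ternary_max(f, lo, hi, iters=200):
    """Maximize a unimodal function on [lo, hi] by ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


a_opt = ternary_max(bracketed_bound, 2.0, 100.0)
C = bracketed_bound(a_opt)
```

Under this assumption the maximizer and the maximum agree with the values stated in the proof to the displayed precision.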



Invoking relation (5) and Theorem 3.2 in order to prove the near optimality of truncated series estimators for the family $\mathcal{F}_\infty^{m,n}$, we next establish some properties of the Kolmogorov width and explore its relation to the approximation radius. Before proceeding, we make a comparison between the proposed lower bound and the one obtained by considering the hardest rectangular sub-problem.



Relation to the hardest rectangular sub-problem. One technique in the literature [5, 
Q3] for lower bounding the minimax risk is to find the "hardest" box contained in the body 
(or compute the Bernstein width, defined as the side length of the largest cube enclosed in the body) and apply the known lower bound for the box. The approximation radius can
always be used to achieve at least the same asymptotical lower bound. 

Suppose that $X$ contains a box with side lengths $\tau_1, \dots, \tau_n$. Then using the box bound [5], we have that $R(X, \sigma) = \Omega(\sum_i \tau_i^2\sigma^2/(\tau_i^2 + \sigma^2)) = \Omega(\sum_i \min(\tau_i^2, \sigma^2))$. Now group the $\tau_i$'s as follows. The first group consists of $\tau_1, \dots, \tau_{k_1}$, where $k_1$ is the smallest index such that $\sum_{i=1}^{k_1} \min(\tau_i^2, \sigma^2) \ge \sigma^2$. The second group consists of $\tau_{k_1+1}, \dots, \tau_{k_2}$, where $k_2$ is the smallest number such that $\sum_{i=k_1+1}^{k_2} \min(\tau_i^2, \sigma^2) \ge \sigma^2$, and so forth. Let $k$ be the total number of groups. Firstly, note that $\sum_{i \in I} \min(\tau_i^2, \sigma^2)$ is at most $2\sigma^2$, for all groups $I$. Hence $k = \Omega(\sum_i \min(\tau_i^2/\sigma^2, 1))$. Secondly, by construction, for all groups $I$ (except possibly the last one), we have $\sum_{i \in I} \min(\tau_i^2, \sigma^2) \ge \sigma^2$. Let $k'$ be the number of these groups. For each of them, we can replace the corresponding face by its diagonal with length $\sqrt{\sum_{i \in I} \tau_i^2} \ge \sigma$. This way we obtain a $k'$ dimensional box with each side length at least $\sigma$. Now it is straightforward to see that $z_{c,k'}(X) = \Omega(\sqrt{k'}\sigma)$, and by Theorem 3.2 we get a lower bound of $\Omega(k'\sigma^2) = \Omega(k\sigma^2) = \Omega(\sum_i \min(\tau_i^2, \sigma^2))$.
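
The grouping step in this comparison is a simple greedy pass over the side lengths. The following sketch implements it as described; the function names are illustrative, and the side lengths are processed in the order given.

```python
import math


def group_sides(taus, sigma):
    """Greedy grouping: close a group once its sum of min(tau^2, sigma^2) reaches sigma^2.

    Returns (all groups, complete groups); only the last group can be incomplete.
    """
    groups, current, total = [], [], 0.0
    for tau in taus:
        current.append(tau)
        total += min(tau * tau, sigma * sigma)
        if total >= sigma * sigma:
            groups.append(current)
            current, total = [], 0.0
    complete = list(groups)
    if current:
        groups.append(current)  # leftover sides that never reached sigma^2
    return groups, complete


def diagonal(group):
    """Length of the diagonal of the face spanned by the sides in a group."""
    return math.sqrt(sum(tau * tau for tau in group))
```

Each complete group contributes one side of length at least $\sigma$ to the lower dimensional box, and each group's truncated sum stays below $2\sigma^2$ because a single term is at most $\sigma^2$.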

5 A duality relationship for Kolmogorov widths 

We take a detour to establish the connection between the Kolmogorov width and the approximation radius. The connection is via a novel duality relationship between the Kolmogorov widths of $X$ and those of its polar dual, as stated in Theorem 3.3. The proof is an application of some celebrated works in convex geometry [1, 19, 8].

Definition 5.1. A set of vectors $V = \{v_1, \dots, v_s\}$ is called $\delta$-wide if for any $1 \le i \le s$, $\mathrm{dist}(v_i, \mathrm{span}[V \setminus \{v_i\}]) \ge \delta$.

The following proposition concerns an interesting property of $\delta$-wide sets, and can be gleaned from [1, 19, 8]. For the reader's convenience, we give the proof of this proposition in Appendix D.

Proposition 5.2. For any $\delta$-wide set $V = \{v_1, \dots, v_s\}$, there exists $\sigma \subseteq \{1, \dots, s\}$ with $|\sigma| \ge (1 - \epsilon)s$ such that for any $a = (a_j)_{j \in \sigma}$,

$$\Big\| \sum_{j \in \sigma} a_j v_j \Big\| \ge c \sqrt{\epsilon/s}\; \delta \sum_{j \in \sigma} |a_j|,$$

with $c = (\sqrt{2} - 1)/2$.

Now we use the above proposition to prove Theorem 3.3.

Proof. (Theorem 3.3) Write $\delta = d_k(X)$. Consider the $k + 1$ points $V = \{v_1, \dots, v_{k+1}\}$ inside $X$ which form the largest $k + 1$ simplex. By the maximality of the volume of $V$, for any $1 \le i \le k + 1$,

$$\mathrm{dist}(v_i, \mathrm{span}[V \setminus \{v_i\}]) = \max_{x \in X} \mathrm{dist}(x, \mathrm{span}[V \setminus \{v_i\}]). \tag{14}$$

Note that the vectors in $V$ are affinely independent, and thus $\dim(\mathrm{span}[V \setminus \{v_i\}])$ is either $k$ or $k - 1$. Therefore, there exists an $r$-codimensional projection $P$ such that $\mathrm{Ker}(P) = \mathrm{span}[V \setminus \{v_i\}]$, and $r \in \{k - 1, k\}$. Then

$$\mathrm{dist}(v_i, \mathrm{span}[V \setminus \{v_i\}]) = \|P(v_i)\|. \tag{15}$$

Also, by Eq. (14), we have

$$\|P(x)\| = \mathrm{dist}(x, \mathrm{span}[V \setminus \{v_i\}]) \le \mathrm{dist}(v_i, \mathrm{span}[V \setminus \{v_i\}]) = \|P(v_i)\|, \tag{16}$$

for any $x \in X$. On the other hand, since $d_r(X) \ge d_k(X) = \delta$ and $X$ is centrally symmetric, there exist $x, y \in X$ such that $\|P(x) - P(y)\| \ge 2\delta$. Hence

$$\|P(v_i)\| \ge \tfrac{1}{2}\left(\|P(x)\| + \|P(y)\|\right) \ge \tfrac{1}{2}\|P(x) - P(y)\| \ge \delta.$$

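Using Eq. (15), $V$ is $\delta$-wide. By Proposition 5.2, there exists $\sigma \subseteq \{1, \dots, k\}$ with $|\sigma| \ge (1 - \epsilon)k$ such that for any $a = (a_j)_{j \in \sigma}$,

$$\Big\| \sum_{j \in \sigma} a_j v_j \Big\| \ge c \sqrt{\epsilon/k}\; \delta \sum_{j \in \sigma} |a_j|, \qquad c = \frac{\sqrt{2} - 1}{2}. \tag{17}$$

Let $H = \mathrm{span}[\{v_i \mid i \in \sigma\}]$. We claim that

$$H \cap X \supseteq H \cap B_2^n(c\sqrt{\epsilon/k}\,\delta).$$

Consider $Y = \{\sum_{i \in \sigma} a_i v_i \mid \sum_{i \in \sigma} |a_i| \le 1\}$. Since $X$ is convex and centrally symmetric, and $\{v_i\} \subseteq H \cap X$, we have $Y \subseteq H \cap X$. Hence, it suffices to show that $H \cap B_2^n(c\sqrt{\epsilon/k}\,\delta) \subseteq Y$. For any given $x \in H \cap B_2^n(c\sqrt{\epsilon/k}\,\delta)$, let $r^* = \max\{r : rx \in Y\}$. Clearly, there exists $a = (a_j)_{j \in \sigma}$ such that $r^* x = \sum_{j \in \sigma} a_j v_j$ and $\sum_{j \in \sigma} |a_j| = 1$. Hence,

$$r^* \|x\| = \Big\| \sum_{j \in \sigma} a_j v_j \Big\| \ge c \sqrt{\epsilon/k}\; \delta \sum_{j \in \sigma} |a_j| = c\sqrt{\epsilon/k}\,\delta. \tag{18}$$

As $x \in H \cap B_2^n(c\sqrt{\epsilon/k}\,\delta)$, we have $\|x\| \le c\sqrt{\epsilon/k}\,\delta$ and by Eq. (18) we obtain $r^* \ge 1$. Consequently, $x \in Y$. Since $x \in H \cap B_2^n(c\sqrt{\epsilon/k}\,\delta)$ was arbitrary, we have

$$H \cap B_2^n(c\sqrt{\epsilon/k}\,\delta) \subseteq Y \subseteq H \cap X. \tag{19}$$

By Fact 2.5,

$$P_H(X^\circ) = (H \cap X)^\circ \subseteq \left(H \cap B_2^n(c\sqrt{\epsilon/k}\,\delta)\right)^\circ = H \cap B_2^n\!\left(\frac{1}{c\sqrt{\epsilon/k}\,\delta}\right).$$

Thus $\mathrm{rad}(P_H(X^\circ)) \le 1/(c\sqrt{\epsilon/k}\,\delta)$. Note that $P_H \in \mathcal{P}_{n - \dim(H)}$. Hence,

$$d_{n - \dim(H)}(X^\circ) = \min_{P \in \mathcal{P}_{n - \dim(H)}} \mathrm{rad}(P(X^\circ)) \le \frac{1}{c\sqrt{\epsilon/k}\,\delta}. \tag{20}$$

Since $\dim(H) = |\sigma| \ge (1 - \epsilon)k$, recalling Eq. (6), $d_{n - (1-\epsilon)k}(X^\circ) \le d_{n - \dim(H)}(X^\circ)$. Taking $c_1 = 1/c = 2/(\sqrt{2} - 1)$, we have

$$d_k(X)\, d_{n - (1-\epsilon)k}(X^\circ) \le c_1 \sqrt{k/\epsilon}. \tag{21}$$

□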






Before we pass to the next section, we make a few remarks about the duality relationship stated in Theorem 3.3.

Remark 5.3. The dependence on $k$ is the best possible. Consider $X = B_1^n$, the unit $\ell_1$ ball. Then $X^\circ = B_\infty^n$. It is easy to see that for any $0 \le k, k' \le n$, $d_k(X) \ge \sqrt{1 - k/n}$ and $d_{n-k'}(X^\circ) \ge \sqrt{k'}$. When $k \le n/2$ and $k' = \Omega(k)$, we have that $d_k(X)\, d_{n-k'}(X^\circ) = \Omega(\sqrt{k})$. We do not know if the dependence on $\epsilon$ is the best possible. But for the application in this paper, the dependence on $\epsilon$ is not significant as it will be chosen as a constant.

Remark 5.4. By using the maximum volume ellipsoid, it is fairly easy to show that for any $0 \le k < n$,

$$d_k(X)\, d_{n-k-1}(X^\circ) \le \sqrt{n}.$$

Consider the maximal enclosed ellipsoid $E \subseteq X$. By John's theorem [10], $E \subseteq X \subseteq \sqrt{n}E$. Let the axes lengths of $E$ be $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n > 0$. Then $d_k(X) \le \sqrt{n}\lambda_{k+1}$ since $X \subseteq \sqrt{n}E$. On the other hand, by duality $X^\circ \subseteq E^\circ$. The axes lengths of $E^\circ$ are $1/\lambda_n \ge \dots \ge 1/\lambda_2 \ge 1/\lambda_1$. So $d_{n-k-1}(X^\circ) \le 1/\lambda_{k+1}$. Therefore, $d_k(X)\, d_{n-k-1}(X^\circ) \le \sqrt{n}$.

However, proving the stated bound requires a more advanced tool (Proposition 5.2).

Remark 5.5. In [13], a duality about Gelfand numbers is given, where the Gelfand number $c_k$ is defined as

$$c_k(X) = \min_{H : \mathrm{codim}(H) = k} \mathrm{rad}(H \cap X).$$

Observe that $c_k \le d_k$. To put it in a comparable form, in [13] it is shown that there exists a constant $D > 0$ such that for any $0 < \kappa < 1$, $c_k(X)\, c_{(1-\kappa)n - k - D}(X^\circ) = O(1/\kappa)$. This duality relation focuses on the duality gap, i.e. the product can be upper bounded by a constant. In our case there is a factor of $\sqrt{k}$. However, the dimension gap, i.e. the difference between the dimension in one term and the co-dimension in the other term, is $\kappa n$ in this relationship, whereas ours is $\epsilon k$, which is much smaller when $k$ is small. If we were to apply the duality in [13], we would need to set $\kappa = \epsilon k/n$, resulting in a bound of $O(n/(\epsilon k))$, much larger than our bound when $k$ is small. On the other hand, the duality in [13] holds with high probability for a randomly chosen subspace, while this is not true for our bound.

6 Main theorem 

In the previous section, we showed a relationship between the Kolmogorov widths of $X$ and its polar dual $X^\circ$. This easily translates to a relation between the Kolmogorov width and the radius of the largest Euclidean ball contained in $X$, which in turn gives us a bound on the minimax risk of the truncated series estimator. However, the bound is fairly weak due to the large duality gap of $\sqrt{k}$. If there were a constant in place of $\sqrt{k}$, we would already obtain the results we are after. Unfortunately, per Remark 5.3, this dependence cannot be improved. In this section, we show that if $X$ is defined by $m$ hyperplanes, we can scale the largest Euclidean ball contained in $X$ by a factor of $\sqrt{k/\log m}$ such that the fraction of its volume inside $X$ is still non-negligible, even though the scaled ball may grow outside of
$X$. This gives us the proof of Theorem 3.4 and therefore of Theorem 3.1.

Proof. (Theorem 3.4) Let $X$ be an arbitrary element in $\mathcal{F}_\infty^{m,n}$. Hence, there exists $A \in \mathbb{R}^{m \times n}$ such that $X = \{x \in \mathbb{R}^n : \|Ax\|_\infty \le 1\}$. Let $\tau = d_{n-k}(X^\circ)$. By definition of Kolmogorov widths, there exists a subspace $H$ with $\dim(H) = k$ such that $\mathrm{rad}(P_H(X^\circ)) = \tau$. Let $H = (h_1, \dots, h_k)$, where the $h_i$'s form an orthonormal basis of $H$. By Fact 2.4, $X^\circ = A^T B_1^m$, and $P_H(X^\circ) = HH^T A^T B_1^m$. As $\mathrm{rad}(P_H(X^\circ)) = \tau$, for any $y \in P_H(X^\circ)$, $\|y\| \le \tau$. Equivalently, for any $w \in B_1^m$, $\|H^T A^T w\| \le \tau$. Let $F = AH$ and write $F = (f_{ij})_{m \times k}$. By duality of matrix norms,

$$\max_{w \in B_1^m} \|F^T w\| = \max_{1 \le i \le m} \Big( \sum_{j=1}^{k} f_{ij}^2 \Big)^{1/2}.$$

Since $\|F^T w\| \le \tau$ for $w \in B_1^m$, we have $\sqrt{\sum_j f_{ij}^2} \le \tau$ for any $1 \le i \le m$.

Consider a random vector $g = (g_1, \dots, g_k)$ where the $g_i$'s are i.i.d. standard Gaussians. Denote by $\mu$ the probability density function of $g$, i.e.,

$$\mu(g) = \frac{1}{(2\pi)^{k/2}} \exp\Big\{ -\frac{1}{2} \sum_{i=1}^{k} g_i^2 \Big\}.$$

Let $\mu_0 = 1/(2\pi)^{k/2}$, $r = \sqrt{2k \ln(1/(2c_*))}$, and $\mu_1 = P(\|g\| \le r)$.

Using the standard tail bound for sum of random normal variables [TJ, for any constant $c > 0$, and for any $1 \le i \le m$,

$$P\Big\{ |(Fg)_i| \ge c \Big( \sum_{j=1}^{k} f_{ij}^2 \Big)^{1/2} \sqrt{\ln m} \Big\} \le m^{-c^2/2}. \tag{22}$$

Since $\sqrt{\sum_j f_{ij}^2} \le \tau$, we obtain

$$P\{ |(Fg)_i| \ge c\tau\sqrt{\ln m} \} \le m^{-c^2/2}. \tag{23}$$

Applying the union bound for $1 \le i \le m$, we obtain

$$P\{ Fg \in c\tau\sqrt{\ln m}\, B_\infty^m \} = 1 - P\Big( \bigcup_{i=1}^{m} \{ |(Fg)_i| \ge c\tau\sqrt{\ln m} \} \Big) \ge 1 - m^{1 - c^2/2}. \tag{24}$$

Consequently,

$$P\{ Hg \in c\tau\sqrt{\ln m}\, X \} = P\{ \|Fg\|_\infty \le c\tau\sqrt{\ln m} \} \ge 1 - m^{1 - c^2/2}. \tag{25}$$

Assuming $m \ge 2$, and letting $c = \sqrt{4 - 2\log_2 \mu_1}$, we obtain $P\{ Hg \in c\tau\sqrt{\ln m}\, X \} \ge 1 - \mu_1/2$. Note that the function $\mu(g)$ is decreasing in $\|g\|$. Therefore,

$$\mathrm{vol}\big( c\tau\sqrt{\ln m}\, X \cap B_2^k(r) \big) \ge \frac{1}{\mu_0} P\big\{ Hg \in ( c\tau\sqrt{\ln m}\, X \cap B_2^k(r) ) \big\} \ge \frac{1}{\mu_0} \Big( P(Hg \in c\tau\sqrt{\ln m}\, X) + P(Hg \in B_2^k(r)) - 1 \Big) \ge \frac{1}{\mu_0} \Big( 1 - \frac{\mu_1}{2} + \mu_1 - 1 \Big) = \frac{\mu_1}{2\mu_0}. \tag{26}$$

Here, $B_2^k(r)$ is the $k$ dimensional $\ell_2$ ball in the subspace $H$. (Recall that $\dim(H) = k$.)

Fact 6.1. Let $\mu_0 = 1/(2\pi)^{k/2}$, $r = \sqrt{2k \ln(1/(2c_*))}$, and $\mu_1 = P(\|g\| \le r)$. The following hold true.

(a) $\mu_1 \ge \mu_0 e^{-r^2/2}\, \mathrm{vol}(B_2^k(r))$.

(b) If $c_* \le 0.2$, then

$$\mu_1 \ge 1 - 2c_* \sqrt{2e \ln \frac{1}{2c_*}} \ge 0.1. \tag{27}$$

We refer to Appendix E for the proof of Fact 6.1. Using Fact 6.1 (part (a)) in Eq. (26), we get

$$\left( \frac{\mathrm{vol}\big( c\tau\sqrt{\ln m}\, X \cap B_2^k(r) \big)}{\mathrm{vol}(B_2^k(r))} \right)^{1/k} \ge \Big( \frac{1}{2} e^{-r^2/2} \Big)^{1/k} = \frac{1}{2^{1/k}} e^{-r^2/(2k)} = \frac{2c_*}{2^{1/k}} \ge c_*. \tag{28}$$

Scaling the sets by the factor $1/(c\tau\sqrt{\ln m})$ in the left hand side of Eq. (28), and using the definition of approximation radius,

$$z_{c_*,k}(X) \ge \frac{r}{c\tau\sqrt{\ln m}} = c_2 \sqrt{\frac{k}{\ln m}} \cdot \frac{1}{d_{n-k}(X^\circ)}, \tag{29}$$

where $c_2 = (\sqrt{2}/c) \sqrt{\ln(1/(2c_*))}$. Using Fact 6.1 (part (b)),

$$c = \sqrt{4 - 2\log_2 \mu_1} \le 3.3, \tag{30}$$

whence we obtain $c_2 \ge 0.4\sqrt{\ln(1/(2c_*))}$. This concludes the proof. □
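
The numerical constants used at the end of this proof can be sanity-checked with a few lines of Python. The sketch assumes that the lower bound in part (b) of Fact 6.1 reads $1 - 2c_*\sqrt{2e\ln(1/(2c_*))}$, which is our reading of the display; the helper names are illustrative.

```python
import math


def mu1_lower_bound(c_star):
    """Assumed form of the bound in Fact 6.1(b): 1 - 2 c_* sqrt(2 e ln(1 / (2 c_*)))."""
    return 1.0 - 2.0 * c_star * math.sqrt(2.0 * math.e * math.log(1.0 / (2.0 * c_star)))


def c_constant(mu1):
    """The constant c = sqrt(4 - 2 log2(mu_1)) chosen in the proof."""
    return math.sqrt(4.0 - 2.0 * math.log2(mu1))


def c2_prefactor(c):
    """The factor sqrt(2) / c multiplying sqrt(ln(1 / (2 c_*))) in c_2."""
    return math.sqrt(2.0) / c
```

With $\mu_1 \ge 0.1$ the constant $c$ stays below 3.3, and $\sqrt{2}/3.3$ exceeds 0.4, which is the prefactor in Eq. (11).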

With all these preparations, we can now prove the main theorem. 

Proof. (Theorem 3.1) As mentioned earlier, the lower bound is implied by previous work. We only show the upper bound. Recall that $d_k(X)$ is non-increasing in $k$ (see Eq. (6)). Let $k^* = \min\{k \ge 1 \mid d_k(X)^2 \le k\sigma^2\}$ ($k^*$ exists since $d_n(X) = 0$). Consider the two cases below separately.

• ($k^* > 1$): Invoking Eq. (5), $R_T(X, \sigma) \le d_{k^*}(X)^2 + k^*\sigma^2$. By definition of $k^*$, $R_T(X, \sigma) \le 2k^*\sigma^2$. Further,

$$d_{k^*}(X)^2 + k^*\sigma^2 \le d_{k^*-1}(X)^2 + \frac{k^*}{k^*-1}(k^*-1)\sigma^2 \le d_{k^*-1}(X)^2 + 2d_{k^*-1}(X)^2 = 3d_{k^*-1}(X)^2.$$

Hence, $R_T(X, \sigma) \le 3\min\{d_{k^*-1}(X)^2, k^*\sigma^2\}$. On the other hand,

$$z_{c_*,(k^*-1)/2}(X) \ge c_2 \sqrt{\frac{k^*-1}{2\ln m}} \cdot \frac{1}{d_{n-(k^*-1)/2}(X^\circ)} \ge \frac{c_2}{2c_1\sqrt{\ln m}}\, d_{k^*-1}(X), \tag{31}$$

where the first inequality follows from Theorem 3.4 and the second one follows from Theorem 3.3 (applied with $k = k^* - 1$ and $\epsilon = 1/2$). Applying Theorem 3.2,

$$R(X, \sigma) \ge C c_*^2 \min\Big\{ z_{c_*,(k^*-1)/2}(X)^2, \frac{k^*-1}{2}\sigma^2 \Big\} \ge \frac{C c_*^2}{4\ln m} \min(c_2^2/c_1^2, 1) \min\{d_{k^*-1}(X)^2, k^*\sigma^2\} \ge \frac{d}{\ln m} R_T(X, \sigma), \tag{32}$$

for $d = (C c_*^2/12)\min(c_2^2/c_1^2, 1)$.



• ($k^* = 1$): Using Eq. ©,

\[ R_T(X,\sigma) \le \min\{d_0(X)^2,\ d_1(X)^2 + \sigma^2\} \le \min\{\mathrm{rad}(X)^2,\ 2\sigma^2\}, \tag{33} \]

where we used the assumption $k^* = 1$ in the final step. On the other hand, $X$ contains a
segment $S$ with length $\mathrm{rad}(X)$. Using the result of [5],



R(X,a) > R(S,a) = * > ^ min(a 2 , rad(X) 2 ). (34) 



Therefore, $R(X,\sigma) \ge (1/4)R_T(X,\sigma)$.
Combining both cases, we have 

\[ R_T(X,\sigma) \le M_{c_*}\,\ln m\cdot R(X,\sigma), \tag{35} \]

where 

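\[ M_{c_*} = \frac{1}{c'} = \frac{12}{C c_*^2}\max\Big(\frac{c_1^2}{c_2^2},\ 1\Big), \quad C = 2.46\cdot 10^{-4}, \quad c_1 = \frac{2}{\sqrt{2}-1}, \quad c_2 = 0.4\sqrt{\ln(1/(2c_*))}. \]

Minimizing $M_{c_*}$ over $0 < c_* < 0.2$, we obtain $c_* = 0.2$ with $M_{c_*} < 2\cdot 10^8$. □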
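The choice of $k^*$ and the chain of inequalities in the first case of the proof can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the width sequence `widths` is a made-up non-increasing sequence with `widths[n] = 0` standing in for $d_0(X), \dots, d_n(X)$, and the function names are hypothetical.

```python
def k_star(widths, sigma):
    """Smallest k >= 1 with d_k(X)^2 <= k * sigma^2.

    widths[k] plays the role of d_k(X) for k = 0..n; it is assumed to be
    non-increasing with widths[n] == 0, so the loop always terminates.
    """
    for k in range(1, len(widths)):
        if widths[k] ** 2 <= k * sigma ** 2:
            return k
    raise ValueError("widths must end with d_n(X) = 0")

def truncated_risk_bound(widths, sigma, k):
    """Upper bound d_k(X)^2 + k * sigma^2 on the truncated series risk."""
    return widths[k] ** 2 + k * sigma ** 2

# Hypothetical width sequence for a body in R^4 and a noise level.
widths = [1.0, 0.5, 0.25, 0.125, 0.0]
sigma = 0.1

k = k_star(widths, sigma)
bound = truncated_risk_bound(widths, sigma, k)
three_min = 3.0 * min(widths[k - 1] ** 2, k * sigma ** 2)
```

For this sequence the selected index is larger than one, so the first case applies, and `bound` does not exceed `three_min`, matching the inequality $R_T(X,\sigma) \le 3\min\{d_{k^*-1}(X)^2, k^*\sigma^2\}$.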

Remark 6.2. It is essential that $X$ is symmetric. Otherwise, we can take an orthant of $B_1^n$,
which has $O(n)$ faces and has a large gap between $R_T(X,\sigma)$ and $R(X,\sigma)$.



7 Discussions 



7.1 Applications to estimating Lipschitz functions 

The problem of estimating the values of a Lipschitz function at a set of sampled points from
noisy measurements is discussed in the introduction. Since the Lipschitz condition can be
represented as linear conditions, Theorem 3.1 is widely applicable to such problems. For example, the function can be defined on any metric space, the sampling points can be an arbitrary
set of points, and the Lipschitz condition can be of higher order. As long as the number of corresponding
linear constraints is bounded by $n^{O(1)}$ for $n$ samples, the truncated series estimator is within an
$O(\log n)$ factor of the optimal.
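To make the counting argument concrete, the following sketch builds the first-order Lipschitz constraints for sample points on the real line. It is an illustration only: the one-dimensional setting, the function names, and the representation of each constraint as a triple `(i, j, bound)` are assumptions made here, and any additional normalization needed to make the constraint set bounded is omitted.

```python
import math
from itertools import combinations

def lipschitz_constraints(points, lipschitz_const=1.0):
    """Return triples (i, j, bound) encoding |f_i - f_j| <= L * |x_i - x_j|."""
    return [
        (i, j, lipschitz_const * abs(points[i] - points[j]))
        for i, j in combinations(range(len(points)), 2)
    ]

def satisfies(values, constraints, tol=1e-12):
    """Check whether a vector of function values meets every constraint."""
    return all(abs(values[i] - values[j]) <= b + tol for i, j, b in constraints)

points = [0.0, 0.1, 0.4, 0.9, 1.0]
cons = lipschitz_constraints(points)

# Each symmetric constraint |<a, f>| <= b corresponds to two hyperplanes.
m = 2 * len(cons)
n = len(points)
```

Here $m = n(n-1) \le n^2$, so $\log m \le 2\log n$, and the constraint set is symmetric: a vector of values satisfies the constraints exactly when its negation does.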
7.2 Smooth convex bodies

In the above, we have shown that $\beta_\infty^{m,n} = O(\log m)$. The celebrated Pinsker bound [17] states
that $\beta_2^{m,n} = O(1)$. What about $\beta_p^{m,n}$ for other values of $p$? By plugging $\sigma = 1/\sqrt{n}$ into Theorem 3
in [3], we have that for $1 \le p < 2$, $\beta_p^{m,n} = \Omega((n/\log n)^{1-p/2})$. So we will not be able to obtain
a bound similar to Theorem 3.1 when $p < 2$. On the other hand, we conjecture that a similar
upper bound holds when $p > 2$.

Conjecture 7.1. For any $p > 2$, there exists a constant $C = C(p)$ such that for any
$m, n \ge 2$, $\beta_p^{m,n} \le C\log m$.

Define the distance $d(X, Y)$ between two centrally symmetric convex bodies $X, Y$ as the
smallest $c$ such that there exists a uniformly scaled orthogonal transformation $F$ such that
$FY \subseteq X \subseteq cFY$. We note that $d(\cdot,\cdot)$ is similar to but different from the classical Banach-Mazur
distance, in which $F$ is any linear transformation, and that $\log d(\cdot,\cdot)$ is a pseudometric
(non-negative, symmetric, and satisfying the triangle inequality). By straightforward arguments,
P(X) < d{X,Y) 2 p(Y). Since d(B%,B%) = raVs-i/p and d(B^,B^) = n 1 ^, we have the 
following nontrivial bound. 

Corollary 7.2. For p > 2, /3™' n = 0(min(n 1_2//p , m 2/,p logm)). In particular, for p > 2, 
(3™' n = 0(^n logn). 



7.3 Tightness of the approximation radius bound 

We have used the approximation radius to lower bound the minimax risk of a convex body 
X. How tight is this bound? This paper has shown that it is at least within O(logm) factor 
of the optimal upper bound, and it is achieved by using the (rather limited) truncated series 
estimators. 

As discussed before, the approximation radius provides a lower bound at least as good
as the one obtained from the Bernstein width, which is known to be asymptotically optimal for $B_p^n$ when $p > 2$.
In this section, we consider $B_p^n$ for $1 \le p < 2$ and show that the lower bound obtained from the
approximation radius is very close to the minimax upper bound but does leave a small gap
of a factor of $\Theta((\log n)^{1-p/2})$.

We start by upper bounding $z_{c,k}(X)$. For any linear $k$-dimensional subspace $H_k$, and
$B_2^k(r) \subseteq H_k$, we have

\[ B_2^k(r) \cap B_p^n \subseteq H_k \cap B_p^n. \tag{36} \]

As proved in [12], if $1 \le p < 2$, then $\mathrm{vol}(H_k \cap B_p^n) \le \mathrm{vol}(B_p^k)$. Using the formula for the
volume of the $k$-dimensional $\ell_p$ ball [22], we have

\[ \mathrm{vol}(B_p^k) = 2^k\,\frac{\Gamma(1/p+1)^k}{\Gamma(k/p+1)} = \Big(\frac{C_p}{k^{1/p}}\Big)^k, \tag{37} \]

where $C_p$ is a constant that depends on $p$. Hence, for any $H_k$,

\[ \left(\frac{\mathrm{vol}(B_2^k(r) \cap B_p^n)}{\mathrm{vol}(B_2^k(r))}\right)^{1/k} \le \left(\frac{C_p^k/k^{k/p}}{C_2^k r^k/k^{k/2}}\right)^{1/k} = \frac{C_p}{C_2}\cdot\frac{k^{1/2-1/p}}{r}. \tag{38} \]

Therefore, $z_{c,k}(B_p^n) \le \frac{C_p}{C_2 c}\,k^{1/2-1/p}$. For the lower bound of $z_{c,k}(B_p^n)$, choose $H_k$ to be
one of the $k$-dimensional principal subspaces. Then $B_p^n \cap H_k = B_p^k \supseteq k^{1/2-1/p}B_2^k$. Hence,
$z_{c,k}(B_p^n) \ge k^{1/2-1/p}$. So, $z_{c,k} = \Theta(k^{1/2-1/p}/c)$. Applying the lower bound in Theorem 3.2,
we obtain $R(B_p^n,\sigma) = \Omega(\max_k \min(k^{1-2/p}, k\sigma^2))$. When $\sigma \le 1$, we choose $k \approx \sigma^{-p}$
and obtain a lower bound of $R(B_p^n,\sigma) = \Omega(\sigma^{2-p})$. By [3], the optimal upper bound for $B_p^n$
is $R = O(\sigma^{2-p}(2\log n\sigma^p)^{1-p/2})$ for $(1/n)^{1/p} \ll \sigma \ll \sqrt{1/\log n}$. Hence the approximation
radius bound leaves a gap of $O((\log n)^{1-p/2})$. Actually, the largest gap we know of is $\sqrt{\log n}$,
obtained by setting $p = 1$ in the above bound.
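The balancing step $k \approx \sigma^{-p}$ can be checked by brute force. The sketch below is an illustration with hypothetical names and parameter values chosen here; it maximizes $\min(k^{1-2/p}, k\sigma^2)$ over integer $k$ and compares the result with $\sigma^{2-p}$.

```python
def lower_bound_profile(sigma, p, n):
    """Return (k, value) maximizing min(k^(1-2/p), k * sigma^2) over 1 <= k <= n."""
    best_k, best_val = 1, 0.0
    for k in range(1, n + 1):
        val = min(k ** (1.0 - 2.0 / p), k * sigma ** 2)
        if val > best_val:
            best_k, best_val = k, val
    return best_k, best_val

# p = 1: the two terms cross at k = 1/sigma, giving a value of order sigma.
k1, v1 = lower_bound_profile(sigma=0.01, p=1.0, n=10_000)

# p = 1.5: the two terms cross at k = sigma^(-1.5).
k2, v2 = lower_bound_profile(sigma=0.01, p=1.5, n=10_000)
```

The first term decreases in $k$ and the second increases, so the maximum sits at the crossing point. The maximizing index is close to $\sigma^{-p}$ and the maximal value is close to $\sigma^{2-p}$ in both cases.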

7.4 Computational complexity 

We have shown that the truncated series estimator is close to optimal for symmetric convex
polytopes. For the family of ellipsoids $\mathcal{F}_2^{m,n}$, the optimal truncated series estimator can be
computed by using the singular value decomposition. However, computing the best truncated
series estimator, or the Kolmogorov width, for symmetric convex polytopes is a hard problem.
When $k = 0$, $d_0(X)$ is the diameter of $X$, and it is exactly the $\ell_2$-norm maximization problem
considered in [2]. The problem is NP-hard. Further, it is shown in [2] that it is hard to
approximate within any constant factor unless P=NP.
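For an ellipsoid the computation is simple once the semi-axes are known. The sketch below assumes that a singular value decomposition has already been carried out and has produced the semi-axis lengths; it uses the standard fact that the Kolmogorov $k$-width of an ellipsoid equals its $(k+1)$-th largest semi-axis, and it selects $k$ by minimizing the bound $d_k(X)^2 + k\sigma^2$. The function names and the example values are hypothetical.

```python
def kolmogorov_widths_ellipsoid(semi_axes):
    """Widths d_0..d_n of an ellipsoid: d_k is the (k+1)-th largest semi-axis, d_n = 0."""
    return sorted(semi_axes, reverse=True) + [0.0]

def best_truncation(semi_axes, sigma):
    """Return (k, bound) minimizing d_k^2 + k * sigma^2 over k = 0..n.

    The corresponding estimator projects the observation onto the span of the
    k longest principal axes of the ellipsoid.
    """
    widths = kolmogorov_widths_ellipsoid(semi_axes)
    bounds = [widths[k] ** 2 + k * sigma ** 2 for k in range(len(widths))]
    k = min(range(len(bounds)), key=bounds.__getitem__)
    return k, bounds[k]

# Hypothetical semi-axes (in arbitrary order) and noise level.
k_opt, bound_opt = best_truncation([0.1, 2.0, 0.5, 1.0], sigma=0.4)
```

In this example the shortest axis is dropped: keeping three directions costs $3\sigma^2$ in variance and leaves a squared bias of only $0.1^2$.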

On the other hand, by using a semi-definite programming (SDP) relaxation, one can compute
an $O(\sqrt{\log m})$ approximation of the diameter [9, 15], i.e. $d_0(X)$. However, it is not known
how to approximate $d_k(X)$ for $k \ge 1$. [21] showed that if the number of vertices of $X$ is $v$,
then SDP gives an $O(\sqrt{\log v})$ approximation for $d_k$. However, in our problem, the number
of vertices of a symmetric convex polytope could be exponential in $n$. So the technique in [21]
does not directly apply to our problem.

8 Conclusion 

In this paper, we show that the truncated series estimator can achieve nearly optimal minimax 
risk for symmetric convex bodies defined by few hyperplanes. There are some outstanding 
open questions raised by this work. 

1. What is the best bound for $\beta_\infty^{m,n}$? Our work leaves a gap between $\Omega(\sqrt{\log m/\log\log m})$ and
$O(\log m)$.

2. What is the best bound for $\beta_p^{m,n}$ for $p > 2$? We conjecture it is $O(\log m)$.

3. How tight is the approximation radius bound for lower bounding the minimax risk for
convex bodies? For the $\ell_1$ ball, it has a gap of $O(\sqrt{\log n})$. This is the largest gap we know
of.

4. How to efficiently approximate the optimal truncated series estimator for any symmetric 
convex polytope? 

A Proof of Proposition 4.2

Consider any $\delta$-packing $M_\delta(X)$ in $X$. Let $M_\delta(X) = \{x_1, \dots, x_r\}$, and let $u$ be a random
variable uniformly distributed on the hypothesis set $\{x_1, \dots, x_r\}$. Denote by $M(y)$ the
estimate of $x$ given the observation $y$. Define $w = \arg\min_{1 \le j \le r}\|M(y) - x_j\|$. Since $\|x_j - x_{j'}\| \ge \delta$
for $j \ne j'$, we have $w = j$ if $\|M(y) - x_j\| < \delta/2$. Therefore,

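\[ \sup_{x \in X}\mathbb{E}\|M(y) - x\|^2 \ge \Big(\frac{\delta}{2}\Big)^2\max_{1 \le j \le r}\mathbb{P}\Big\{\|M(y) - x_j\| \ge \frac{\delta}{2}\ \Big|\ u = j\Big\} \ge \frac{\delta^2}{4r}\sum_{j=1}^{r}\mathbb{P}(w \ne j \mid u = j) = \frac{\delta^2}{4}\,\mathbb{P}(w \ne u). \tag{39} \]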

Let $h(p)$ be the entropy function defined as

\[ h(p) = -p\log p - (1-p)\log(1-p), \quad \text{for } 0 \le p \le 1. \]

Denote by $H(u|w)$ the posterior entropy of $u$ given $w$, and denote by $I(u;w)$ the mutual
information between $u$ and $w$, defined as

\[ I(u;w) = H(u) - H(u|w) = \log r - H(u|w). \]

Using Fano's inequality ([3], p. 39),

\[ \mathbb{P}(w \ne u)\log(r-1) \ge H(u|w) - h(1/2) = H(u) - I(u;w) - \log 2 = \log r - I(u;w) - \log 2. \tag{40} \]

We recall the definition of the K-L distance between two probability densities $p, q$ on a set $\Omega$ [3]:

\[ D_{KL}(p,q) = \int_\Omega p\log\frac{p}{q}\,d\mu, \tag{41} \]

where $\mu$ is the measure on $\Omega$ with respect to which $p$ and $q$ are densities.

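Using a property of mutual information (the data processing inequality), and its relation to K-L divergence ([3], p. 30,
33), we have

\[ I(u;w) \le I(u;M(y)) \le I(u;y) = \mathbb{E}_u\{D_{KL}(P(y|u), P(y))\} \le \max_{1 \le j \le r} D_{KL}(P(y|x_j), P(y)). \tag{42} \]

Let $N_\epsilon(X)$ be any $\epsilon$-net for $X$. Considering the uniform prior distribution on $N_\epsilon(X)$, we
write $P(y) = \frac{1}{|N_\epsilon(X)|}\sum_{x \in N_\epsilon(X)} P(y|x)$. Also, by definition of $\epsilon$-net, for any $x_j$, $1 \le j \le r$,
there exists $\tilde{x}_j \in N_\epsilon(X)$ with $\|x_j - \tilde{x}_j\| \le \epsilon$. Hence,

\[ D_{KL}(P(y|x_j), P(y)) = \mathbb{E}\Big\{\log\frac{P(y|x_j)}{\frac{1}{|N_\epsilon(X)|}\sum_{x \in N_\epsilon(X)}P(y|x)}\Big\} \le \mathbb{E}\Big\{\log\frac{P(y|x_j)}{\frac{1}{|N_\epsilon(X)|}P(y|\tilde{x}_j)}\Big\} = \log|N_\epsilon(X)| + D(P(y|x_j), P(y|\tilde{x}_j)). \tag{43} \]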






Following the Gaussian observation model, $y|x_j \sim N(x_j, \sigma^2 I)$ and $y|\tilde{x}_j \sim N(\tilde{x}_j, \sigma^2 I)$. Using the definition of
K-L distance (Eq. (41)), after some simple algebraic manipulations, we have

\[ D(P(y|x_j), P(y|\tilde{x}_j)) = \frac{1}{2\sigma^2}\|x_j - \tilde{x}_j\|^2 \le \frac{\epsilon^2}{2\sigma^2}. \tag{44} \]

Combining Eqs. (42), (43), and (44), we obtain

\[ I(u;w) \le \log|N_\epsilon(X)| + \frac{\epsilon^2}{2\sigma^2}. \tag{45} \]

Using Eqs. (39), (40), and (45), we obtain the desired result.
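The closed form in Eq. (44) can be verified numerically in one dimension; the multivariate case with covariance $\sigma^2 I$ reduces to a sum of such terms over coordinates. The sketch below is only a sanity check: the function names, the midpoint-rule integration, and the example parameters are choices made here.

```python
import math

def gaussian_pdf(y, mean, sigma):
    """Density of N(mean, sigma^2) at y."""
    return math.exp(-((y - mean) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def kl_numeric(mean_p, mean_q, sigma, half_width=12.0, steps=20000):
    """Midpoint-rule approximation of the integral of p * log(p / q).

    p = N(mean_p, sigma^2) and q = N(mean_q, sigma^2). The log-ratio is
    evaluated in closed form to avoid taking the log of tiny densities.
    """
    lo = mean_p - half_width * sigma
    hi = mean_p + half_width * sigma
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * h
        log_ratio = ((y - mean_q) ** 2 - (y - mean_p) ** 2) / (2.0 * sigma ** 2)
        total += gaussian_pdf(y, mean_p, sigma) * log_ratio * h
    return total

def kl_closed_form(mean_p, mean_q, sigma):
    """Closed form |mean_p - mean_q|^2 / (2 sigma^2) from Eq. (44)."""
    return (mean_p - mean_q) ** 2 / (2.0 * sigma ** 2)

numeric = kl_numeric(0.3, -0.2, 0.5)
closed = kl_closed_form(0.3, -0.2, 0.5)
```

With means at distance $0.5$ and $\sigma = 0.5$, both quantities agree with $\epsilon^2/(2\sigma^2)$ evaluated at $\epsilon = 0.5$.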



B Proof of Lemma 4.3

Since $r \ge \mathrm{rad}(X)$, $X \subseteq B_2(r)$. Hence, any $\epsilon$-net for $B_2(r)$ is also an $\epsilon$-net for $X$. We begin
by covering $B_2(r)$ with a finite family of balls of radius $\epsilon$. Choose the sequence of centers
$p_1, p_2, \dots$ in such a way that

\[ p_{i+1} \notin \bigcup_{j=1}^{i} B_2(p_j, \epsilon). \]

When this is no longer possible, the sequence is terminated. Now the set $P = \{p_i\}$ is an $\epsilon$-net
for $B_2(r)$. Meanwhile, note that the smaller balls $B_2(p_i, \epsilon/2)$ are all disjoint (since no two of
the $p_i$ are within distance $\epsilon$ of each other). In addition, $B_2(p_i, \epsilon/2) \subseteq B_2(r) \oplus B_2(\epsilon/2)$, where
$\oplus$ denotes the Minkowski sum. Therefore,

\[ |P|\,\mathrm{vol}(B_2(\epsilon/2)) = \sum_i \mathrm{vol}(B_2(p_i, \epsilon/2)) \le \mathrm{vol}\big(B_2(r) \oplus B_2(\epsilon/2)\big). \tag{46} \]

Evidently, $B_2(\epsilon/2) \subseteq \frac{1}{2}B_2(r)$, since $\epsilon \le r$. Hence, $B_2(r) \oplus B_2(\epsilon/2) \subseteq \frac{3}{2}B_2(r)$, and
$\mathrm{vol}\big(B_2(r) \oplus B_2(\epsilon/2)\big) \le (3/2)^n\,\mathrm{vol}(B_2(r))$. Using Eq. (46), we obtain

\[ |P| \le \frac{(3/2)^n\,\mathrm{vol}(B_2(r))}{\mathrm{vol}(B_2(\epsilon/2))} = \Big(\frac{3r}{\epsilon}\Big)^n. \]



C Proof of Lemma 4.4

Let $M_\delta(X)$ denote the maximum size $\delta$-packing of $X$. By maximality of $M_\delta(X)$, any other
point in $X$ is within distance $\delta$ of one of the points in $M_\delta(X)$. Hence,

\[ X \subseteq \bigcup_{p \in M_\delta(X)} B_2(p, \delta), \]

whence we obtain

D Proof of Proposition 5.2

The proof is based in a crucial way on the following lemma proved in [8].
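Lemma D.1. Let $u_1, \dots, u_s \in \mathbb{R}^n$, $\|u_i\| \le 1$. Define the set

\[ E = \Big\{(\delta_j)_{j=1}^{s} : \Big\|\sum_{j=1}^{s}\delta_j u_j\Big\|^2 \le 2s\Big\}. \]

Then, for every $\epsilon \in (0, 1)$, there exists $\sigma \subseteq \{1, \dots, s\}$ with $|\sigma| \ge (1-\epsilon)s$, such that

\[ P_\sigma(E) \supseteq c\sqrt{\epsilon}\,[-1,1]^\sigma, \qquad c = \frac{\sqrt{2}-1}{\sqrt{2}}, \]

where the restriction map $P_\sigma$ is defined as $P_\sigma : (\delta_j)_{j=1}^{s} \mapsto (\delta_j)_{j \in \sigma}$.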

Since the set $V = \{v_1, \dots, v_s\}$ is $\delta$-wide, there exist $y_1, \dots, y_s \in \mathbb{R}^n$, so that



$\langle v_i, y_j\rangle = \mathbb{1}\{i = j\}$, and



(48) 



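Let $u_i = \delta y_i$. Applying Lemma D.1, there exists a set $\sigma \subseteq \{1, \dots, s\}$, with $|\sigma| \ge (1-\epsilon)s$,
and $P_\sigma(E) \supseteq c\sqrt{\epsilon}\,[-1,1]^\sigma$. Hence we can find $(\delta_j)_{j=1}^{s} \in E$ such that $\delta_j = c\sqrt{\epsilon}\,\mathrm{sign}(a_j)$ for
$j \in \sigma$. Then,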



jr'Gcr J Sit i=l 

1 * 

j Git i=l 
1 1 S ^^11 ' ^|| ^ 



(49) 



< 



j'Gcr 

1 /2a II 



i=l 



civ e 



where the first step follows from Eq. (48). Rearranging the terms in Eq. (49) implies the
result.



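E Proof of Fact 6.1

Proof (Part (a)).

\[ \mu_1 = \mathbb{P}(\|g\| \le r) = \mu_0\int_{\|x\| \le r} e^{-\|x\|^2/2}\,dx \ge \mu_0 e^{-r^2/2}\,\mathrm{vol}(B_2^H(r)). \tag{50} \]

□

Proof (Part (b)). We will first upper bound $\mathbb{P}(\|g\| \ge r)$ using a Chernoff bound.

\[ \mathbb{P}(\|g\| \ge r) = \mathbb{P}\Big(e^{\lambda\sum_{i=1}^{k} g_i^2} \ge e^{\lambda r^2}\Big) \le e^{-\lambda r^2}\,\mathbb{E}\Big\{e^{\lambda\sum_{i=1}^{k} g_i^2}\Big\}. \tag{51} \]

Since $g_i$ are i.i.d. standard normal variables, it is easy to see that

\[ \mathbb{E}\Big\{e^{\lambda\sum_{i=1}^{k} g_i^2}\Big\} = (1 - 2\lambda)^{-k/2}. \tag{52} \]

Using Eq. (52) in Eq. (51) and substituting for $r$, we obtain

\[ \mathbb{P}(\|g\| \ge r) \le \left(\frac{(2c_*)^{2\lambda}}{\sqrt{1 - 2\lambda}}\right)^{k}. \tag{53} \]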

Minimizing the right hand side over $\lambda$ gives $\lambda = \frac{1}{2}\big(1 + 1/(2\ln(2c_*))\big)$. Notice that $\lambda > 0$
for $0 < c_* < 0.2$. Substituting for $\lambda$ in Eq. (53) gives

\[ \mathbb{P}(\|g\| \ge r) \le \Big(2c_*\sqrt{2e\ln(1/(2c_*))}\Big)^{k} \le 2c_*\sqrt{2e\ln(1/(2c_*))}, \]

where the last step follows from $c_* < 0.2$ and $k \ge 1$. Now, $\mu_1 = 1 - \mathbb{P}(\|g\| > r)$. The result